ABSTRACT

The mammalian thymine DNA glycosylase (TDG) is 
implicated in active DNA demethylation via the base 
excision repair pathway. TDG excises the mis- 
matched base from G:X mismatches, where X is 
uracil, thymine or 5-hydroxymethyluracil (5hmU). 
These are, respectively, the deamination products 
of cytosine, 5-methylcytosine (5mC) and 5-hydroxy- 
methylcytosine (5hmC). In addition, TDG excises the 
Tet protein products 5-formylcytosine (5fC) and 
5-carboxylcytosine (5caC) but not 5hmC and 5mC, 
when paired with a guanine. Here we present a 
post-reactive complex structure of the human TDG 
domain with a 28-base pair DNA containing a 
G:5hmU mismatch. TDG flips the target nucleotide 
from the double-stranded DNA, cleaves the 
N-glycosidic bond and leaves the C1'-hydrolyzed 
abasic sugar in the flipped state. The cleaved 
5hmU base remains in a binding pocket of the 
enzyme. TDG allows hydrogen-bonding interactions 
to both T/U-based (5hmU) and C-based (5caC) modi- 
fications, thus enabling its activity on a wider range 
of substrates. We further show that the TDG catalytic 
domain has higher activity for 5caC at a lower pH 
(5.5) as compared to the activities at higher pH (7.5 
and 8.0) and that the structurally related Escherichia 
coli mismatch uracil glycosylase can excise 5caC as 
well. We discuss several possible mechanisms, 
including the amino-imino tautomerization of the 
substrate base that may explain how TDG discrimin- 
ates against 5hmC and 5mC. 



INTRODUCTION 

Mammalian DNA cytosine modification is a dynamic 
process and occurs by converting cytosine (C) to 
5-methylcytosine (5mC), established by specific DNA 
methyltransferases, and then to 5-hydroxymethylcytosine 
(5hmC or H) by ten-eleven-translocation (Tet) proteins 
(1-4). Tet proteins can further oxidize 5hmC to 
5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) (5,6). 
The genomic 5fC and 5caC contents are very low [5-10 fmol 
(5)] compared to hundreds of pmols of 5hmC present in many 
tissues and cell types examined (1). In addition, the content of 
5-hydroxymethyluracil (5hmU), the deamination product of 
5hmC, is also relatively low [<3.5pmol (1)]. These data 
suggest that modification products of 5hmC are either 
produced rarely or are short-lived possibly because of 
removal by subsequent enzymatic reactions. 

The mammalian thymine DNA glycosylase (TDG) has 
been proposed to be involved in active DNA demethy- 
lation through the removal of deamination products of 
5mC or its oxidized derivatives by the base excision 
repair pathway (7-9). Consistent with this role, the 
activation-induced deaminase (AID), a DNA-cytosine 
deaminase, is reported to be required to demethylate 
pluripotency genes during reprogramming of the somatic 
genome in embryonic stem cell fusions (10), and AID- 
deficient animals are less efficient in erasure of DNA 
methylation in primordial germ cells (11). Additionally, 
another member of the AID/APOBEC family (12), 
APOBEC3A, is more efficient at 5mC deamination than 
AID (13). High expression of a member of the AID/ 
APOBEC family may promote 5mC deamination, 
creating a T:G mismatch (14,15), or 5hmC deamination, 
producing a 5hmU:G mismatch (8), which would be 
subject to excision by TDG (9,16) (Figure 1). 

Figure 1. A putative pathway of DNA demethylation involving DNA methylation by DNMTs, hydroxylation by Tet proteins, deamination by 
members of the APOBEC superfamily, and base excision by TDG linked to base excision repair (BER). In addition, eMUG can excise 5caC as well (see 
Figure 2). DNA major groove and minor groove sides are indicated. Horizontal small arrows indicate the hydrogen bond donors and acceptors for 
5caC and 5hmU bases. (a) C, 5mC and its oxidized derivatives (5hmC, 5fC and 5caC) form base pairs with an opposite G. (b) Deamination-linked 
mismatches. 



Also of particular interest are recent reports that TDG 
can excise 5fC and 5caC (but not 5hmC and 5mC) from 
DNA (6,17). This new specificity of TDG suggests a 
deamination-independent active DNA demethylation 
pathway through Tet-mediated oxidation of 5hmC 
(Figure 1). Here, we explore the structural and biochem- 
ical basis of TDG excision of 5hmU (a deamination 
product) and 5caC (a Tet-mediated oxidation product). 

Human TDG catalytic domain (residues 111-308) has 
been crystallized with an abasic analog (tetrahydrofuran) 
within a 22-bp DNA with one 5'-overhanging adenine or 
thymine (22+1 bp) (18). We initially followed this published 
crystallization procedure (18) and collected a 
complete dataset at 4.0 Å resolution of the catalytic 
mutant N140A in complex with the 22+1-bp DNA 



containing a G:5caC site (Supplementary Figure S1). 
The diffraction quality of this type of P6₅ crystal 
(containing two TDG molecules and one DNA duplex) 
varied significantly and required screening of many 
crystals to achieve ~3.0 Å resolution (18). During the 
course of the study, structures were reported for the 
same TDG fragment in complex with the same 22+1-bp 
DNA containing either an A:5caC mismatch or a modified 
5caC (with a 2'-fluoro substitution on the deoxyribose of 
5caC) paired with G (19). Both crystals diffracted asymmetrically 
to 3 Å along the a and b axes and 4 Å along the 
c axis (19). 

Here we focus on the structural study of the TDG 
domain in complex with DNA containing a G:5hmU 
mismatch and compare the structure to that of TDG 



Nucleic Acids Research, 2012, Vol 40, No. 20 10205 



with a 2'-deoxy-2'-fluoroarabinouridine (20). We also investigate 
the biochemical properties of TDG on the 
G:5caC substrate and compare them to that of a TDG- 
related mismatch-specific uracil glycosylase (MUG) from 
Escherichia coli. 

MATERIALS AND METHODS 

Expression and purification of TDG 

Human TDG residues 111-308 (pXC1056) and its mutants 
(see below) were expressed using the pET28b vector as 
described (18). The proteins were expressed in E. coli 
BL21(DE3)-Gold cells with the RIL-Codon plus plasmid 
(Stratagene). Cultures were grown at 37°C until the OD600 
reached 0.5; at that point the temperature was shifted to 
16°C, and isopropyl β-D-1-thiogalactopyranoside (IPTG) 
was added to 0.4 mM to induce expression. Cells were 
re-suspended with a 4x volume of 300 mM NaCl, 
20 mM sodium phosphate, pH 7.4, 20 mM imidazole, 
1 mM dithiothreitol (DTT) and 0.25 mM phenylmethylsulphonyl 
fluoride and sonicated for 5 min (1 s on and 
2 s off). The lysate was clarified by centrifugation twice 
at 38 000 g for 30 min. Hexahistidine fusion protein was 
isolated on a nickel-charged chelating column (GE 
Healthcare). The His6 tag was removed by adding 50 
units of thrombin to the imidazole eluate from the Ni 
column and incubating for 16 h at 4°C, leaving six extraneous 
N-terminal amino acids (GSHMAS). The cleaved 
protein was further purified by collecting the flow-through 
of a HiTrap SP column (GE Healthcare) and 
concentrated. The concentrated protein was then loaded 
onto a Superdex 75 (16/60) column (equilibrated with 100 
mM NaCl, 20 mM HEPES, pH 7.0, 1 mM DTT) where it 
eluted as a single peak corresponding to a monomeric 
protein. The purification of eMUG has been described 
previously (21). 

Mutagenesis 

The following mutants were generated by PCR mutagen- 
esis, expressed and purified similar to the wild-type protein: 
single point mutations of N140A (pXC1057), N140D 
(pXC1105), A145S (pXC1156), N157A (pXC1155), 
S200A (pXC1112), K201A (pXC1120), N230D 
(pXC1113), S271A (pXC1114), S271H (pXC1122) and 
quadruple mutant of P198-G199-S200-K201 to AAAA 
(pXC1123). 

Crystallography of TDG and DNA complexes 

For co-crystallization with the 28-bp DNA, 0.35 mM of 
TDG wild-type protein was mixed with 0.2 mM of 
annealed oligonucleotide (synthesized by New 
England Biolabs, Inc.): 5'-CAG CTC TGT ACG TGA 
GCG ATG GAC AGC T-3' and 5'-AGC TGT CCA 
TCG CTC AXG TAC AGA GCT G-3' where X is 
5hmU. The 5-hydroxymethyl-deoxyU phosphoramidites 
used for DNA synthesis were purchased from Glen 
Research. Crystals appeared within 24 h under the condi- 
tions of 30% polyethylene glycol (PEG) 4000, 0.2 M 
ammonium acetate, 0.1 M sodium acetate, pH 4.6. 



Crystals were cryoprotected by soaking in mother liquor 
supplemented with 20% ethylene glycol. X-ray diffraction 
datasets were collected at the SER-CAT beamline 
(22ID-D) at the Advanced Photon Source, Argonne 
National Laboratory and processed using HKL2000 
(22). The structures were solved by molecular replacement 
by PHENIX (23) using TDG and the abasic DNA 
complex structure [PDB 2RBA (18)] as the search 
model. Electron density for DNA was easily interpretable, 
and its model was built using the programs O and 
Coot (24). PHENIX refinement scripts were used for 
refinement, and the statistics shown in Supplementary 
Table 1 were calculated for the entire resolution range. 
The R_free and R_work values were calculated for 5% 
(randomly selected) and 95%, respectively, of observed 
reflections. 

DNA glycosylase activity assay 

TDG activity assays were performed using various oligo- 
nucleotides labeled with 6-carboxy-fluorescein (FAM) and 
monitoring the excision of the target base by denaturing 
gel electrophoresis following NaOH hydrolysis of the 
abasic site (Supplementary Figure S2). TDG protein 
(0.5 µM) and an equal amount of double-stranded 
FAM-labeled 32-bp duplexes were mixed in 20 µL 
nicking buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA, 
0.1% BSA) and incubated at 37°C for 30 min: 
(FAM)-5'-TCG GAT GTT GTG GGT CAG XGC ATG ATA 
GTG TA-3' (where X = C, 5mC, 5hmC, U, T, 5hmU or 
5caC) and 5'-TAC ACT ATC ATG CGC TGA CCC 
ACA ACA TCC GA-3'. 

The reactions were stopped by adding 2 µL of 1 N 
NaOH and by boiling for 10 min. Twenty microliters 
of loading buffer (98% formamide, 1 mM EDTA and 
1 mg/ml of Bromophenol Blue and Xylene Cyanole) 
were added, and the reaction mixtures were boiled 
for another 10 min. Samples were immediately cooled in 
ice water and loaded onto a 10 × 10 cm² 15% denaturing 
gel containing 7 M urea, 24% formamide, 15% acrylamide 
and 1x Tris-Borate-EDTA (TBE). The gels 
were run in 1x TBE buffer for 60 min at 200 V. FAM-labeled 
single-stranded DNA was visualized under UV 
exposure. 

For enzymatic reactions under single turnover condi- 
tion, the FAM-labeled 32-bp duplexes (250 nM) and 
10-fold excess of TDG catalytic domain or eMUG 
(2.5 µM) were incubated for 0-30 min (for G:5caC at 
37°C or room temperature) or 0-5 min (for G:U at 4°C) 
in 0.1% BSA and 1 mM EDTA buffered by a mixture of 
15 mM citric acid, 30 mM Bis-tris propane and 15 mM 
2-(cyclohexylamino)ethanesulfonic acid (CHES) adjusted 
to the indicated pH. The intensities of the FAM-labeled 
DNA were determined by Typhoon Trio+ (GE 
Healthcare) and quantified by the image-processing 
program ImageJ (NIH). The data were fitted by nonlinear 
regression using GraphPad PRISM 5.0d 
(GraphPad Software Inc.): $[\mathrm{Product}] = P_{\max}\,(1 - e^{-kt})$, 
where $P_{\max}$ is the product plateau level, $k$ is the 
observed rate constant and $t$ is the reaction time. 
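
The single-exponential model above can be illustrated with a minimal Python sketch. It is not the GraphPad PRISM procedure used in the study: instead of nonlinear regression it assumes the plateau P_max is already known and estimates k from the linearized form -ln(1 - P/P_max) = kt. The function names and the time course are hypothetical and serve only to show how the model behaves.

```python
import math


def product_fraction(t_min, p_max, k_obs):
    """Single-exponential product formation: P(t) = P_max * (1 - exp(-k * t))."""
    return p_max * (1.0 - math.exp(-k_obs * t_min))


def estimate_k_obs(times, products, p_max):
    """Estimate k by least squares through the origin on the linearized form
    -ln(1 - P/P_max) = k * t (assumes P_max is known).
    Points at t <= 0 or at/above the plateau carry no usable information
    and are skipped."""
    num = 0.0
    den = 0.0
    for t, p in zip(times, products):
        if t <= 0 or p >= p_max:
            continue
        y = -math.log(1.0 - p / p_max)
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("need at least one usable time point")
    return num / den


# Hypothetical, noise-free time course (illustrative values, not data from this study)
times = [0, 1, 2, 5, 10, 20, 30]          # minutes
true_k, true_p_max = 0.4, 0.9             # assumed rate constant (min^-1) and plateau
products = [product_fraction(t, true_p_max, true_k) for t in times]

k_fit = estimate_k_obs(times, products, true_p_max)
print(f"estimated k_obs = {k_fit:.3f} min^-1")
```

With real gel data, noise near the plateau is strongly amplified by the logarithm, which is one reason a direct nonlinear fit of P_max and k, as described in the text, is generally preferable.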

DNA binding assay 

TDG protein (1.0 µM) and 0.5 µM of 32-bp FAM-labeled 
DNA (same as above) were mixed in 20 µL nicking buffer 
(10 mM Tris-HCl, pH 8.0, 1 mM EDTA, 0.1% BSA) and 
incubated at 37°C for 15 min. Samples were loaded onto a 
10 × 10 cm² 10% native polyacrylamide gel in 1x 
Tris-Borate-EDTA (TBE) buffer and run for 40 min at 
100 V. 



RESULTS 

TDG forms a stable complex with specific DNA 

We first measured the glycosylase and binding activities of 
the TDG fragment (residues 111-308) using various 
32-base-pair (bp) DNA oligonucleotides, each containing 
a single modified base X (X = C, M, H, U, T, 5hmU or 
5caC) within a G:X pair in a CpG sequence. As expected, 
glycosylase activity was not observed with TDG on oligo- 
nucleotides bearing the 'natural' G:C, G:M and G:H base 
pairs, while substrates bearing G:T, G:U and G:5hmU 
mismatches were efficiently cleaved (Figure 2a). 
Therefore, TDG is capable of acting on deamination-generated 
products. In addition, TDG is capable of 
excising 5caC when paired with a guanine, which presumably 
preserves Watson-Crick base-pair hydrogen bonds 
(Figure 1a). A catalytic mutant (N140A) is inactive on 
all substrates under the conditions of pH 8.0 at 37°C for 
30min (Figure 2a), as reported previously (25,26). The 
glycosylase activity correlates well with the ability to 
form a specific complex in electrophoretic mobility-shift 
assay (Figure 2b), under the condition of 2:1 molar ratio 
of enzyme to DNA. 

Escherichia coli mismatch uracil glycosylase (eMUG) is 
related to human TDG both in sequence (27) (31% 
identity and 43% similarity; Figure 2c) and in structure 
(28). Under the same single turnover conditions used for 
TDG, eMUG excises G:5caC in addition to G:U 
mismatch (Figure 2d), albeit much more slowly. We did 
not detect eMUG activity on oligonucleotides bearing 
G:T and G:5hmU mismatches, in agreement with a 
previous report that eMUG has 10^4-10^5-fold reduced 
activity on T and 5hmU compared to U (29). 

Crystallization of TDG catalytic domain bound 
with 28-bp DNA 

To improve the resolution of X-ray diffraction, we varied 
the lengths of oligonucleotides used for crystallization. 
Only after we used a 28-bp oligonucleotide were we able 
to consistently grow crystals that formed in space group 
C2 and which reached the modest resolutions of 2.3 to 2.6 
Angstroms. We report here the TDG structure in complex 
with the 28-bp DNA containing a G:5hmU mismatch at 
2.5 Å resolution (Supplementary Table S1). We observed 
electron density for TDG residues 111-305 and all 28 base 
pairs of DNA (Figure 3a and Supplementary Figure S3). 
TDG flips the target nucleotide 5hmU from the double-stranded 
DNA, cleaves the N-glycosidic bond and leaves 
the abasic sugar in the flipped state. The cleaved 5hmU 
base remains in a binding pocket of TDG. We will first 



describe the overall structure and then the detailed inter- 
actions involving 5hmU. 

Overall structure of the 1:1 TDG-DNA complex 

Unlike the structure of TDG with the 22+1-bp DNA that 
consisted of two protein molecules per DNA (18), our 
crystals with the 28-bp DNA contain a single 
protein-DNA complex at 1:1 ratio per crystallographic 
asymmetric unit (even under crystallization conditions of 
~2:1 ratio of protein to DNA), confirming that the func- 
tional reaction complex involves a TDG monomer (30). 
The protein component is highly similar to that of the 
previous structure (18), with a root mean squared deviation 
of ~0.6 Å when comparing the 182 pairs of Cα 
atoms (residues 123-304). One significant difference is 
that, although we used a protein of the same length, 
our new structure revealed an additional 
N-terminal helix (αA; residues 115-122), which extends 
toward the DNA minor groove together with the 
N-terminal tail (Figure 3b). 

The majority of the protein-DNA interactions in the 
two structures are also very similar, except for the 
flipped out target base in our new structures. Briefly, 
TDG makes phosphate contacts spanning five base pairs 
but mostly on the phosphates surrounding the modified 
nucleotide (two 5'- and three 3'-phosphate groups; 
summarized in Figure 3c). The backbone of the DNA 
strand containing the modified base is firmly gripped by 
the loop containing Arg275 and residues P198-G199-S200 
followed by a 3₁₀ helix, approaching from opposite directions 
(major and minor grooves, respectively) (Figure 3d). 
The side chain of Arg275 penetrates into the DNA helix 
from the minor groove, occupying the space left by the 
flipped-out modified nucleotide (Figure 3e). The positively 
charged guanidino group of Arg275 electrostatically 
interacts with the phosphate group immediately 3' to 
the modified base, and the 5'-phosphate two bases away 
(−1 and +2 phosphate groups in Figure 3c). 

An 'intercalation' loop containing Arg275 (amino acids 
270-281) also contains residues interacting with both 
guanines of the CpG dinucleotide. The intrahelical 
orphaned guanine hydrogen bonds with the main 
chain carbonyl oxygen atoms of Ala274 and Pro280 
(Figure 3f). The exocyclic N2 atom of the guanine of the 
neighboring G:C pair hydrogen bonds with Gln278 from 
the minor groove (Figure 3g). This TDG-guanine inter- 
action ensures the specificity of base excision to be within 
the CpG dinucleotide (18). No interaction to the neigh- 
boring G:C pair in the major groove side is observed, 
suggesting that the modification status at C5 of the Cyt 
in the complementary strand (methyl, hydroxymethyl or 
carboxyl) would have no impact on TDG activity (31). In 
addition, Ala277 intercalates between the Cyt and Gua 
bases of the unmodified strand, resulting in an ~25° 
kink (Figure 3h). 

Structure of a post-reactive complex: TDG with 
a cleaved 5hmU base 

In the TDG structure in complex with the G:5hmU pair, the 
N-glycosidic bond of the modified nucleotide is cleaved 

Figure 2. Base excision and binding activities of TDG catalytic domain in the context of a double-stranded CpG dinucleotide. (a) Double-stranded 
32-bp oligonucleotides bearing a single CpG dinucleotide were incubated with equal amount of the glycosylase domain of TDG or its noncatalytic 
mutant N140A at 37°C for 30 min. The oligonucleotide was labeled with FAM on the top strand, and the modification status was indicated 
(M = 5mC and H = 5hmC). The products of the reactions were separated on a denaturing polyacrylamide gel, and the FAM-labeled strand was 
excited by UV and photographed. (b) DNA binding assays were performed by incubating 0.5 µM FAM-labeled oligonucleotides with 1 µM of TDG 
at 37°C for 15 min. (c) Pairwise sequence alignment of human TDG domain (top line) and E. coli MUG (bottom line). Secondary structural elements 
are shown above or below the aligned sequences. White-on-black residues are invariant between the two sequences examined, while gray-highlighted 
positions are conserved (R and K, E and D, Q and N, T and S, F, Y and W, V, I, L and M, and G and P). Positions highlighted by * are active site 
residues responsible for catalysis (Asn140) and/or proposed for substrate base recognition (only two of them, Asn140 and Asn157, are invariant 
between human TDG and E. coli MUG). (d) eMUG is active on G:U and G:5caC substrates (top panel). Reactions were performed at room 
temperature (approximately 22°C) for 30 min with [E_eMUG] = [S_DNA] = 5 µM. The kinetic activities of eMUG on G:U substrates at 4°C (bottom left 
panel) and G:5caC at room temperature (approximately 22°C) (bottom right panel) were measured under single turnover condition 
([E_eMUG] = 2.5 µM and [S_DNA] = 0.25 µM) at three different pH values (5.5 in red, 7.5 in orange and 8.0 in blue curves). 



leaving the abasic sugar bound in the active site 
surrounded by Ile139, Gly199, Asn140 and Gly142 
(Figure 4a and b). There is an extra electron density 
near the C1' atom indicating the existence of an attached 
hydroxyl oxygen (Figure 3a inset) and suggesting that the 
hydroxylation already occurred (see below). The catalytic 
residue Asn140 directly approaches the sugar ring with its 
main-chain and side-chain carbonyl oxygen atoms interacting 
with the hydroxyl oxygen attached to the C1' of the 
abasic sugar ring. Similar to the structures of uracil DNA 
glycosylase (32,33), the cleaved base of 5hmU remains 
bound in a cage-like pocket via hydrophobic interactions 
that involve stacking face to face with the abasic sugar 
ring and face to edge with the aromatic ring of Tyr152 
(Figure 4b), suggesting a tight binding of the abasic 
reaction product (26,30,34-36). Such tight binding by 
the glycosylase (which lacks AP lyase activity) protects 
the abasic site from nonspecific processing until subse- 
quent repair activities are recruited to the lesion site and 
TDG sumoylation facilitates enzymatic turnover (37). 

The Watson-Crick polar edge of the cleaved 5hmU 
base is involved in interactions with the side-chain 
hydroxyl oxygen atom of Ser271 and the main-chain 
carbonyl oxygen atom of Asn230 (via the O2 atom), the 
main-chain amide nitrogen atom of Ile139 (via the N3 
atom) and the side-chain amino group of Asn191 (via 
the O4 atom) (Figure 4c). The hydroxyl oxygen atom of 
5hmU interacts directly with the main-chain amide 



nitrogen of Gly142 (Supplementary Figure S2c) and 
makes water-mediated contacts with Ala145 and the 5'-phosphate 
group of the abasic site (Figure 4c). We note that 
the interaction between Ile139 and the ring N3 atom does 
not form an ideal hydrogen bond because both the amide 
nitrogen atom and the N3 atom under normal physio- 
logical conditions carry a hydrogen atom. 

The fact that TDG retains its cleaved base in a cage-like 
pocket is due to the lack of an open cleft through which the 
cleaved base can diffuse out to solvent. Since human TDG 
has a broad substrate spectrum for base modifications 
opposite guanine (16,17,38), we reasoned that the cage 
should be able to accommodate a variety of cleaved 
bases via the stacking interactions between the abasic 
sugar and Tyr152 (a conserved residue among members 
of the TDG family; Supplementary Figure S4), whereas the 
interactions with Ser271 and Asn230 (to O2), Ile139 
(to N3) and Asn191 (to N4) in the cage are potentially 
applicable to C-based modifications, for example, 5caC 
(Supplementary Figure S2d). We also note that residues 
whose side chains point to the cage, Ser271, Ser272, 
His151, Asn157, Asn191 and Asn230, can all act as 
hydrogen bond donors and/or acceptors (Figure 4d), 
allowing flexibility in accommodating a variety of 
bases. On the other hand, a network of polar interactions 
(Figure 4e) is critical for stabilizing the active site, 
including Tyr152 (stacking with the target base), Asn230 
(interacting with O2), Ser272 (whose main chain and side 

Figure 3. Structure of TDG in complex with G:5hmU containing DNA. (a) 2Fo-Fc electron density, contoured at 1σ above the mean, for the entire 
28-bp DNA used in the TDG structure determination. The inset is an enlarged abasic sugar with a hydrolyzed C1'. (b) Overall structure of the WT 
TDG complex. DNA is in stick model, and TDG is in ribbon model. The Arg275-containing loop is colored in magenta, the P-G-S loop is in cyan 
and catalytic loop in blue. (c) Summary of the TDG-DNA interactions: mc, main-chain-atom-mediated contacts; black boxes represent the CpG 
sequence and extrahelical 5hmU. (d) The Arg275-containing intercalation loop (in magenta) and the P-G-S loop (in cyan) approach the modified 
DNA strand from opposite directions. (e) Arg275 penetrates into the DNA helix from the minor groove. (f) The three hydrogen bonds formed 
with the intrahelical orphaned guanine. (g) Gln278 forms a hydrogen bond from the minor groove side with Gua of the adjoining G:C base pair. 
(h) Ala277 intercalates between the central Cyt and Gua of the unmodified strand. 



chain polar atoms are saturated with interactions) and 
Asn157 (interacting with the 5'-phosphate of the flipped nucleotide). 
Mutation of Asn230 to its corresponding aspartate 
(N230D) produced a mutant with greatly reduced 
activity. Whereas G:U and G:5caC were still processed, 
the mutant retained only residual activity on G:5hmU and lost 
catalytic activity on G:T substrates (Figure 4e). 



Structural comparison to a pre-reaction complex 

We compared our post-reaction complex structure with 
that of a pre-reactive complex containing 2'-deoxy-2'-fluoroarabinouridine, 
a mimic of deoxyuridine that is 
not cleaved by TDG (20). One of the key observations 
revealed by the pre-reaction complex was a putative 
nucleophilic water molecule, held in position by the 

Figure 4. The binding of 5hmU in the active site. (a) In the WT structure, the N-glycosidic bond of the extrahelical nucleotide is cleaved, and a 
hydroxyl oxygen atom has been attached to the C1' of the sugar ring (see Figure 3a inset). (b) Favorable face-to-face and edge-to-face hydrophobic 
interactions between the sugar, the cleaved 5hmU base and Tyr152. (c) Omit electron density, contoured at 3.5σ above the mean, is shown for 
omitting 5hmU. The hydrogen bond interactions (dashed lines) with the polar atoms of 5hmU are within 3.0 Å distance cutoff: mc, 
main-chain-atom-mediated contacts. (d) The 5hmU-binding pocket is rich in polar atoms. (e) A hydrogen-bonding network involves both side 
chain and main chain atoms of depicted residues. The activity of N230D is shown. 



side-chain carbonyl oxygen atom of Asn140 and the 
backbone carbonyl oxygen of Thr197 (20) (Figure 5a). 
No such water molecule was found in the corresponding 
position of the post-reactive complex where the C1' hydroxylation 
already occurred (Figure 3a inset). 
Superimposition of the pre- and post-reaction complex 
structures revealed that the attack by the water 
molecule at the C1' occurs from the side opposite the 
leaving base (Figure 5b). Other structurally characterized 
DNA N-glycosylases have an acidic residue in the active 
site coordinating the proposed nucleophilic water 
molecule [Asp145 of human uracil DNA glycosylase (39) 
and Asp144 of Bacillus stearothermophilus MutY (40)]. 
We mutated Asn140 of TDG to the corresponding carboxylate 
(N140D), reasoning that the attack of water could 
benefit from the base catalysis provided by the carboxylate 
group of N140D. However, N140D only has measurable 
but reduced activity on G:U, G:5hmU and G:5caC and no 
detectable activity on G:T substrate (Figure 5c). N140D 



probably disrupts the hydrogen bond between the side-chain 
amino group of Asn140 and the backbone 
carbonyl oxygen atom of Arg195 (Figure 5a) and thus 
destabilizes the catalytic loop conformation. 

Superimpositions of a normal intrahelical thymine 
with the post- and pre-reactive complexes, respectively 
(Figure 5d and e), suggest that the everted yet uncleaved 
base rotates around the glycosidic bond as well as bends 
relative to the sugar ring. Once the N1-C1' glycosidic 
bond is cleaved, the base is further rotated (Figure 5f) 
and moved ~4.6 Å to the position in the post-reactive 
complex where the O2 interacts with Ser271 (Figure 5g). 
The corresponding position of Ser271 is an asparagine in 
eMUG (Figure 2c) or a histidine in the UDG superfamily, 
including human uracil DNA glycosylase (hUNG) (41) 
(Supplementary Figure S5a). Since eMUG, like TDG, 
has activity on 5caC (Figure 2d), whereas hUNG ex- 
hibited no activity toward 5caC (6), we mutated Ser271 
to histidine (S271H) in addition to alanine (S271A). 

Figure 5. Comparison of post- and pre-reactive complex structures. (a, b) Superimposition of the post-reactive complex (in color) and the 
pre-reactive complex [in gray; PDB 3UFJ (20)] shows that a putative nucleophilic water molecule, held in position by the side chain carbonyl oxygen 
atom of Asn140 in the pre-reactive complex, attacks the C1' from the opposite position of the leaving base, generating the C1'-hydrolyzed abasic 
sugar as shown in the post-reactive complex (in yellow). (c) The activities of TDG mutants N140D, S271A, S271H and A145S. (d, e) Superimposition 
of a normal intrahelical thymine (colored in magenta) onto the post-reactive complex (panel d) or the pre-reactive complex (panel e) suggests a bend 
relative to the sugar ring and a rotation around the glycosidic bond. (f, g) Superimposition of the post-reactive complex (in color) and the 
pre-reactive complex (in gray) suggests the base undergoes another rotation after cleavage (panel f) and moves towards Ser271 (panel g). (h, i) 
Superimposition of a 5hmU base (in yellow) onto the flipped uracil in the pre-reactive complex (PDB 3UFJ). 



However, neither mutation affects substrate specificity 
(Figure 5c), probably because the Ser271-O2 interaction 
occurs only after the cleavage. 

Finally, we superimposed a 5hmU base onto the flipped 
uracil base in the pre-reactive complex (Figure 5h and i). 
The 5-hydroxymethyl group could fit the space between 
Ala145 and Pro153 (Figure 5h) with the hydroxyl oxygen 
interacting with one of the 5'-phosphate oxygen atoms 
(Figure 5i). We mutated Ala145 to serine (A145S) because 
the corresponding residue in eMUG is a serine (Figure 2c). 
The side chain of Ser23 of eMUG is directed toward the 
5-position of the flipped uracil, and it was suggested that 
the Ser23 would lower the efficiency of thymine excision 
(with a methyl group at the 5-position) but not prevent it 
(42). However, the TDG A145S mutation does not affect 
substrate specificity (Figure 5c), in agreement with a 
previous mutational analysis that A145S mutation has no 



effect on processing G:T substrate (26). This is probably 
because the A145S side-chain hydroxyl oxygen could 
rotate away from the substrate to accommodate various 
modifications at the 5-position. 

Interestingly, in the pre-reactive TDG complex, the 
exocyclic O2 oxygen atom of the flipped uracil base is 
~3.5 Å away from the main-chain amide nitrogen atoms 
of Ile139 and Asn140 as well as the side-chain carbonyl 
oxygen atom of Asn191 (20) (Figure 5i). The Asn191 
side-chain carbonyl oxygen atom is also ~3.5 Å from the 
proton-bearing ring N3 atom (20). The N191A mutation 
causes decreased TDG activities for G:U and G:T 
(20). We note that a simple rotation around the side-chain 
χ2 torsion angle of Asn191 would allow the side-chain 
amino group to form a hydrogen bond with the 
un-protonated N3 atom of Cyt (or its derivatives) 
(Supplementary Figure S1). Furthermore, the interaction 
of exocyclic O4 of uracil with the main-chain amide 
nitrogen of Tyr152 (20) (Figure 5i) is converted to the 
interaction of exocyclic N4 (NH2) of 5caC with the side 
chain of His151 (19) (Supplementary Figure S1b). Thus, 
the TDG active site provides interactions to the polar 
edges of a uracil (and its derivatives 5hmU and thymine) 
as well as 5caC (all substrates of TDG). 

The effect of N-glycosidic bond stability on catalysis 

The rather tolerant nature of the TDG active site raises 
the question as to why 5hmC is not a substrate for TDG 
while 5caC is a substrate. If we suppose that 5hmC was 
flipped into the active site similarly to 5caC, it would have 
identical interactions with TDG along the Watson-Crick 
edge, and perhaps one hydrogen bond less than 5caC at 
the C5 position. Such a small difference seems unlikely to 
be the sole reason why 5hmC is not a substrate and, 
moreover, 5hmU is a substrate despite the presence of 
C5 substitution (Figure 2a). 

Rather than selective base recognition or an inability of 
TDG to completely flip 5hmC (or 5mC or even C) into its 
active site, an alternative explanation has been suggested 
for the specificity of TDG that is attributed to the reactivity 
of the N-glycosidic bond (17,43) as estimated by the electronic 
substituent constant (σm) (44) of the C5 substituent. TDG 
has greater activity for C (and U) analogs with an 
electron-withdrawing C5 substituent (σm > 0), such that 
TDG can rapidly excise 5fC (with a σm value of 0.35) but 
is not active against 5mC (with a σm value of −0.07) and 
5hmC (with σm = 0) (17). Interestingly, the σm value for the 
deprotonated state (COO−) of 5caC is −0.10 (even lower 
than that of 5mC) but 0.37 for the protonated form 
(COOH) (44). Although the robust TDG activity on 5caC 
at pH 7.5 (17) or pH 8.0 (Figure 1a) is not predicted, given 
that the carboxylate group should be deprotonated at those 
pH values, yielding a negative σm value, we investigated 
whether lowering the pH would enhance TDG activity on 
5caC as a result of increasing protonation of the carboxylate 
group. TDG activity on the G:U substrate is known to be 
relatively constant for pH 5.5-9 (45). Accordingly, we 
measured TDG activities on both G:U and G:5caC substrates 
under single turnover conditions (i.e. [E_TDG] >> 
[S_DNA]) at three different pH values of 5.5, 7.5 and 8.0. To 
eliminate the effect of buffer on activity, we used a mixture 
of citric acid, 1,3-bis(tris(hydroxymethyl)methylamino) 
propane (Bis-tris propane) and 2-CHES that was adjusted 
to each pH. We find that the TDG catalytic domain has 
higher activity (by a factor of 5 and 9.5, respectively) for 
the G:5caC at pH 5.5 (k_obs = 1.9 min⁻¹), as compared to the 
activities at pH 7.5 (k_obs = 0.4 min⁻¹) and 8.0 (k_obs = 0.2 
min⁻¹) (Figure 6a). Although the activity for the G:U substrate 
is also higher at pH 5.5 (by approximately a factor of 
2) than that at higher pH (45) (Figure 6b), the more 
dramatic enhancement of activity on G:5caC at lower pH 
suggests that the chemical nature of the target nucleotide, as 
predicted by the σm value of the 5-position substituent, can 
contribute to TDG being active on 5caC, but that alone 
does not fully account for the activity at neutral or higher 
pH. Other, more specific, interactions with the target base 
are likely involved, particularly given that a 
family of plant DNA glycosylases is capable of 
excising 5mC (46) and 5hmC (Supplementary Figure S6), 
albeit slowly (47). 
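
The arithmetic behind the comparison above can be made explicit with a short sketch. It uses only the σm values and the G:5caC rate constants quoted in the text. The function names are illustrative, and the half-time and time-course calculations assume simple single-exponential product formation under single turnover conditions, which is an assumption of this sketch rather than a fitted model from the data.

```python
import math

# Substituent constants (sigma_m) for the C5 substituent, as quoted in the text.
SIGMA_M = {
    "5fC": 0.35,
    "5mC": -0.07,
    "5hmC": 0.0,
    "5caC (COO-, deprotonated)": -0.10,
    "5caC (COOH, protonated)": 0.37,
}

# Observed single-turnover rate constants (min^-1) for G:5caC, as quoted in the text.
K_OBS_5CAC = {5.5: 1.9, 7.5: 0.4, 8.0: 0.2}


def predicted_substrate(sigma_m: float) -> bool:
    """Electron-withdrawing rule from the text: excision is favored when sigma_m > 0."""
    return sigma_m > 0


def fold_enhancement(k_reference: float, k_other: float) -> float:
    """Ratio of two observed rate constants."""
    return k_reference / k_other


def half_time(k_obs: float) -> float:
    """Half-time in minutes, ASSUMING single-exponential product formation."""
    return math.log(2) / k_obs


def fraction_product(k_obs: float, t_min: float) -> float:
    """Fraction of substrate converted at time t (min), same single-exponential assumption."""
    return 1.0 - math.exp(-k_obs * t_min)


if __name__ == "__main__":
    for name, sigma in SIGMA_M.items():
        print(f"{name:28s} sigma_m = {sigma:+.2f}  predicted substrate: {predicted_substrate(sigma)}")

    k_low_ph = K_OBS_5CAC[5.5]
    for ph in (7.5, 8.0):
        ratio = fold_enhancement(k_low_ph, K_OBS_5CAC[ph])
        print(f"pH 5.5 vs pH {ph}: {ratio:.2f}-fold higher k_obs")

    for ph, k in K_OBS_5CAC.items():
        print(f"pH {ph}: half-time = {half_time(k):.2f} min")
```

The ratios 1.9/0.4 and 1.9/0.2 reproduce the factors of about 5 and 9.5 stated above. The σm rule alone predicts no activity on deprotonated 5caC, which is the discrepancy the pH experiment was designed to probe.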

DISCUSSION 

Impact of base pairing and glycosidic torsion angles 
on catalysis 

Several factors may affect the ability of the target nucleotide 
to be flipped from the intrahelical to a fully extrahelical 
position in the active site. One factor to consider is 
whether various oxidation states of 5mC derivatives 
perturb the stability of DNA duplex differently, as 
base-pairing dynamics have a critical role in allowing 
hUNG to capture a spontaneously extruded lesion (41). 
The melting temperatures (Tm) of oligonucleotides containing 
G:C, G:5mC, G:5hmC, G:5fC or G:5caC are 
within 1°C of each other, suggesting no significant 
differences among the base-pairing properties of 5mC derivatives 
(48,49). The hydrogen bond energy of the 
Watson-Crick-type base pair of G:5caC was theoretically 
estimated to be −30 kcal/mol, while the naturally occurring 
G:C base pair shows a hydrogen bond energy of −27 kcal/mol 
(50), suggesting that the modified base pair is more 
stable than the canonical base pair by 3 kcal/mol. 

The second possible factor to consider is whether 
oxidized 5mC derivatives affect the DNA backbone con- 
formation, as MutM distinguishes a target 8-oxoG from 
G via the sugar pucker conformation in the DNA 
backbone (51). This happens because the 8-oxo substitu- 
tion causes a steric clash with the sugar while rotating the 
glycosidic bond torsion angle. However, the 5-position 
substituents would not cause a steric clash with the sugar 
while rotating the glycosidic bond. In summary, while 
the structure of the substrate provides some clues to the 
substrate specificity of TDG, it does not fully explain the 
lack of activity against 5hmC. 

An interrogation along the multi-step flipping pathway 

Extensive studies on DNA glycosylase enzymes, such as 
hUNG and 8-oxoguanine DNA glycosylases (hOGG and 
bacterial MutM) [reviewed in (39,52)], showed that they 
recognize damaged bases through a multi-step interroga- 
tion process. The enzymes distort DNA by bending it 
followed by intrahelical interrogation to detect a lesion, 
flipping of potential substrate nucleotides to varying 
degrees and rejection of non-substrate nucleotide back 
to DNA helices, allowing only a true substrate to reach 
the active site. Our DNA binding data show that the TDG 
catalytic domain binds significantly more weakly to C, 
5mC and 5hmC than to substrate bases including 5caC, 
suggesting a discrimination step before stable (perhaps 
flipped) complex formation (Figure 2b). In the TDG- 
DNA post-reactive complex examined here, we did not 
observe any protein side-chain interaction in the major 
groove of DNA where the modifications at the C5 
position are positioned. However, two side chains of 
TDG, Lys201 of the 3₁₀ helix (whose side chain density 
is disordered) and Ser200 of the P-G-S loop (whose side 
chain forms an interaction with the 3′-phosphate group of 
the flipped nucleotide; Figure 3c) are located in the major 
groove and could potentially form interactions with C5 
modifications (Figure 3d) while scanning along the DNA 
major groove via an active intrahelical interrogation and 
extrusion mechanism as proposed for MutM (51). In 
addition, the P-G-S loop of TDG can be superimposed 
well with the corresponding loop of hUNG involved in 
examination of a partially flipped thymine in the search 
for uracil in DNA (41) (Supplementary Figure S5b and c). 
However, the TDG mutants S200A, K201A and PGSK 
to AAAA affect neither TDG substrate specificity nor 
activity under the conditions tested (Supplementary 
Figure S5d). These results suggest that the P-G-S loop is 
unlikely to play a strong role in discriminating between 
different 5-substituents. 

The potential effect of amino/imino tautomeric 
forms on catalysis 

A strong intramolecular hydrogen bond has been observed 
between the exocyclic N4 amino group and the carbonyl 
oxygen at C5 of 5fC in the free nucleoside form (49,53). 
It was hypothesized that the existence of such a hydrogen 
bond would shift the amino-imino equilibrium (54,55), 
which would enable 5fC to form two, instead of three, 
hydrogen bonds with an opposite G (Figure 7), equivalent 
to a G:T or G:5hmU 'wobble' pair (Figure 1b). Previously 
observed mutagenic potential of 5fC in cells (54,56) sug- 
gested the possible existence of the imino tautomeric 
form, which could result in the mutagenic incorporation 
of an adenine opposite of the 5fC during DNA replication. 
Indeed a small amount (1-2%) of adenine incorporation 
was observed in DNA polymerase reactions in vitro 
(49,55). In a more recent study, the discrimination of GTP 
over ATP is reduced by a factor of ~30 for 5fC template in 
comparison with C template during in vitro RNA polymer- 
ase II transcription (57). 

We speculate that TDG might take advantage of the 
tendency of G:5fC and G:5caC, both of which contain a 
5-position carbonyl oxygen, to form a mismatch-like 
wobble hydrogen bonding pattern (Figure 7) and turn 
them into substrates. Thus, the common theme of TDG 
substrates could be the ability to form wobble pairs. This 
wobble geometry might be important for TDG in the 
initial recognition of the substrate pair. After flipping, 
TDG is able to hold a variety of bases in the active site 
pocket. The enzyme-DNA complex seen in the gel retardation 
assay (Figure 2b) could be either the substrate in the 
wobble pairing conformation or the product complex 
(E•P), not the initial substrate complex (E•S), which is a 
possible reason that there is no gel shift with 5hmC and 
5mC. Additionally, the observed increase in the reaction 
rate at pH 5.5 (Figure 6a) is consistent with increased 
protonation of N3 required for tautomerization. Finally, 
in light of the fact that plant ROS1 is capable of excision 
of both 5mC and 5hmC bases when paired with a guanine 
(Supplementary Figure S6), the mechanism of substrate 
recognition by this glycosylase must be different from 
that of TDG and eMUG. This difference in the two 
enzyme families has yet to be studied structurally, 
(bio)chemically and thermodynamically. 

Figure 6. The activity of TDG catalytic domain as a function of pH. The activity of TDG catalytic domain on (a) G:5caC at 37°C and (b) G:U 
substrates at 4°C under single turnover condition ([E_TDG] = 2.5 µM and [S_DNA] = 0.25 µM) at three different pH values. The reaction on G:U 
substrate was measured at 4°C because the reaction was complete within 1 min at 37°C or room temperature (~22°C) (data not shown). 

Figure 7. Amino-imino tautomerization. 5fC and 5caC exhibit an intramolecular hydrogen bond that could shift the amino/imino equilibrium 
toward the imino tautomeric form which would then base pair with guanine in a mismatch-like wobble pattern. 

ACCESSION NUMBERS 

Protein Data Bank: The coordinates and structure factors 
of the post-reactive complex of human TDG domain 
with 5hmU have been deposited with accession number 
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